

Section: New Results

Ontology Mediated Query Answering

Participants : Jean-François Baget, Meghyn Bienvenu, Efstathios Delivorias, Michel Leclère, Marie-Laure Mugnier, Federico Ulliana.

Ontology-mediated query answering (OMQA) is the problem of querying data while taking into account inferences enabled by ontological knowledge. This gives rise to knowledge bases, composed of a factbase (in database terms: an instance that contains incomplete data) and an ontology. Answers to queries are logically entailed from the knowledge base. Two families of formalisms for representing and reasoning with the ontological component have been considered in this context: description logics (DLs) and existential rules (aka Datalog+, or tuple-generating dependencies in database theory). Both frameworks correspond to fragments of first-order logic, which are incomparable in general but closely related in the context of OMQA: indeed, most DLs considered for OMQA, known as lightweight DLs, are naturally translated into specific classes of existential rules. Importantly, the foundational work carried out by the knowledge representation community led to the definition of several W3C standards for Semantic Web languages, namely the family of OWL 2 ontology languages, which can be used in combination with the RDF(S) Semantic Web language. This paradigm is also supported by commercial systems, such as Oracle.

Techniques for query answering under existential rules mostly rely on the two classical ways of processing rules, namely forward chaining and backward chaining. In forward chaining (also known as the chase in databases), the rules are applied to enrich the factbase, and query answering can then be solved by evaluating the query against the saturated factbase (as in a classical database system, i.e., forgetting the ontological knowledge). The backward chaining process can be divided into two steps: first, the query is rewritten using the rules into a first-order query (typically a union of conjunctive queries, but possibly a more compact form); then the rewritten query is evaluated against the factbase (again, as in a classical database system). Some classes of existential rules and lightweight description logics ensure the termination of the chase and/or query rewriting, but not all.
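The two processing modes can be compared on a toy example. The following is a minimal sketch, assuming a deliberately tiny rule fragment (unary rules of the form B(x) → A(x), no existential variables) and an atomic query; the predicate names, the constants and the helper functions `saturate`, `rewrite` and `evaluate` are all hypothetical and only serve the illustration.

```python
# Rules (body_pred, head_pred) stand for body_pred(x) -> head_pred(x).
RULES = [("Professor", "Teacher"), ("Teacher", "Person")]
FACTS = {("Professor", "ann"), ("Teacher", "bob"), ("Student", "carl")}

def saturate(facts, rules):
    """Forward chaining: apply the rules until no new fact is produced."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, const in list(facts):
                if pred == body and (head, const) not in facts:
                    facts.add((head, const))
                    changed = True
    return facts

def rewrite(query_pred, rules):
    """Backward chaining: collect every predicate whose instances
    entail the atomic query; the result reads as a union of atomic queries."""
    preds = {query_pred}
    frontier = [query_pred]
    while frontier:
        current = frontier.pop()
        for body, head in rules:
            if head == current and body not in preds:
                preds.add(body)
                frontier.append(body)
    return preds

def evaluate(preds, facts):
    """Plain database-style evaluation of a union of atomic queries."""
    return {const for (pred, const) in facts if pred in preds}

# Query: Person(x)
answers_fc = evaluate({"Person"}, saturate(FACTS, RULES))  # saturate, then evaluate
answers_bc = evaluate(rewrite("Person", RULES), FACTS)     # rewrite, then evaluate
print(answers_fc == answers_bc, sorted(answers_fc))
```

In this fragment both routes terminate and return the same answers; the saturated factbase grows with the data, whereas the rewriting grows with the rules and the query.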

Revisiting the Chase

The interest in existential rules in the OMQA context brought back to light a fundamental tool in database theory, namely the chase. Several chase variants are known: they all yield logically equivalent results, but differ in how they handle redundancies possibly caused by the introduction of unknown individuals (often called nulls). Briefly, detecting redundancies leads to smaller saturated factbases, and prevents some infinite chase sequences, but it is costly. Given a chase variant, the (all-instances) chase termination problem takes as input a set of existential rules and asks if this set of rules ensures the termination of the chase for any factbase. It is well known that this problem is undecidable for all known chase variants.
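The difference between chase variants can be seen on the single rule r(x, y) → ∃z r(x, z). The sketch below is an illustration under simplifying assumptions: it hard-codes this one rule, caps the number of rule applications with a hypothetical `max_steps` parameter (so "not terminated" only means "still running after the cap"), and uses made-up names for nulls.

```python
from itertools import count

def chase(facts, variant, max_steps=10):
    """Chase the rule r(x, y) -> exists z. r(x, z) on a set of pairs (x, y).

    Returns (facts, terminated) where terminated is False if the step cap
    was reached while triggers were still applicable."""
    facts = set(facts)
    fresh = (f"_n{i}" for i in count())   # generator of fresh nulls
    done = set()                          # triggers already applied
    for _ in range(max_steps):
        for (x, y) in sorted(facts):
            if variant == "restricted":
                # Apply only if the head r(x, z) is not already satisfied.
                if any(a == x for (a, _b) in facts):
                    continue
            else:
                # Oblivious: one application per body match (x, y).
                # Semi-oblivious: one application per frontier value x.
                key = (x, y) if variant == "oblivious" else x
                if key in done:
                    continue
                done.add(key)
            facts.add((x, next(fresh)))
            break
        else:
            return facts, True            # no applicable trigger left
    return facts, False

start = {("a", "b")}
for variant in ("oblivious", "semi-oblivious", "restricted"):
    result, terminated = chase(start, variant)
    print(variant, len(result), terminated)
```

On this input the oblivious chase keeps producing new nulls, the semi-oblivious chase stops after one application, and the restricted chase applies nothing because r(a, b) already satisfies the head.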

Hence, a crucial issue is whether chase termination becomes decidable for some known subclasses of existential rules. We considered linear existential rules, a simple yet important subclass of existential rules that generalizes inclusion dependencies. We showed the decidability of the (all-instances) chase termination problem on linear rules for three main chase variants, namely the semi-oblivious, restricted and core chase. The restricted chase is the most used variant of the chase; however, it is notoriously tricky to study because the order in which rule applications are performed matters. Indeed, for the same factbase, some restricted chase sequences may terminate, while others may not. To obtain these results, we introduced a novel approach based on so-called derivation trees and a single notion of forbidden pattern. Besides the theoretical interest of a unified approach and new proofs, we provided the first positive decidability results concerning the termination of the restricted chase, proving that chase termination on linear existential rules is decidable for both versions of the problem: Does every chase sequence terminate? Does some chase sequence terminate? [37] [27] (also to appear at ICDT 2019).

As part of Stathis Delivorias' PhD thesis, we considered the related problem of boundedness, which asks if a given set of existential rules is bounded, i.e., whether there is a predefined upper bound on the depth of the chase, independently of any factbase. This problem is already undecidable in the specific case of datalog rules (whose head has no existential variables). Moreover, knowing that a set of rules is bounded for some chase variant does not help much in practice if the bound is unknown. Hence, we investigated the decidability of the k-boundedness problem, which asks whether a given set of rules is bounded by an integer k. We proved that k-boundedness is decidable for three main chase variants, namely the oblivious, semi-oblivious and restricted chase [23].

We investigated the combination of existential rules and answer set programming (ASP). Combining the two formalisms requires extending existential rules with nonmonotonic negation and extending ASP with existential variables. To this end, we introduced the syntax and semantics of existential nonmonotonic rules using skolemization, which joins the two frameworks together. Building on our previous work published at ECAI and NMR, we presented syntactic conditions that ensure the termination of the chase for existential rules and discussed the extension of these results to the nonmonotonic case [13].

Complexity of Ontology-Mediated Query Rewriting

Extending our previous work published at LICS, we carried out a systematic study of two fundamental problems in ontology-mediated query answering, in the context of the description logic OWL 2 QL. This dialect of the W3C standard ontology language OWL 2 is aimed at efficient query answering on large data and ensures that every conjunctive ontology-mediated query (OMQ) is rewritable into a first-order query. The first problem is the succinctness of first-order rewritings of OMQs, which consists in understanding how difficult it is to build rewritings for queries in some OMQ class, and in particular in determining whether OMQs in the class have polynomial-size rewritings. The second problem is the complexity of OMQ answering. We classified OMQs according to the shape of their conjunctive queries (treewidth, the number of leaves) and the existential depth of their ontologies. For each of these classes, we determined the combined complexity of OMQ answering, and whether all OMQs in the class have polynomial-size first-order, positive existential and nonrecursive datalog rewritings. We obtained the succinctness results using hypergraph programs, a new computational model for Boolean functions, which makes it possible to connect the size of OMQ rewritings and circuit complexity [14].
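Why the target language of a rewriting matters for succinctness can be seen on a standard textbook-style example, which is not taken from [14]. The sketch below assumes an ontology made only of atomic concept inclusions Bi ⊑ Ai (no roles, no existential depth) and the query A0(x) ∧ ... ∧ A(n-1)(x); the function names and the `subclasses` map (assumed to list all subsumees of each concept) are hypothetical.

```python
from itertools import product

def ucq_rewriting(query_atoms, subclasses):
    """Rewrite a conjunction of unary atoms over x into a union of
    conjunctive queries: one disjunct per choice of subsumee per atom."""
    choices = [[atom] + subclasses.get(atom, []) for atom in query_atoms]
    return [tuple(choice) for choice in product(*choices)]

def pe_rewriting_size(query_atoms, subclasses):
    """Number of atoms in the positive existential rewriting
    (A0(x) or B0(x)) and ... and (A(n-1)(x) or B(n-1)(x))."""
    return sum(1 + len(subclasses.get(atom, [])) for atom in query_atoms)

n = 10
atoms = [f"A{i}" for i in range(n)]
subclasses = {f"A{i}": [f"B{i}"] for i in range(n)}

ucq = ucq_rewriting(atoms, subclasses)
print(len(ucq), pe_rewriting_size(atoms, subclasses))
```

With n query atoms the union of conjunctive queries has 2^n disjuncts, while the positive existential form uses 2n atoms. The classification described above concerns much richer OMQ classes, where such gaps are far harder to establish.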

Ontology-Based Data Access

In the above settings, data is supposed to be stored in a factbase built on the same vocabulary as the ontology. We now consider a more general setting, often called Ontology-Based Data Access (OBDA), in which data is stored in one or several databases, which were generally built independently from the ontology. Hence, the ontological level acts as a mediating level, and a new component, namely mappings, makes it possible to translate the answers to queries over the data into facts expressed in the ontology vocabulary. Mappings may be triggered to actually materialize the factbase, but such materialization may be neither possible nor desirable, in which case the factbase remains virtual.
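The role of mappings can be sketched as follows. This is a minimal illustration, assuming a single in-memory source table and two simple mappings written as Python functions; the table, the column names and the ontology predicates `Employee` and `worksFor` are invented for the example and do not come from the work described here.

```python
# A source database built independently from the ontology (hypothetical data).
SOURCE = {
    "employees": [
        {"id": 1, "name": "Ann", "dept": "R&D"},
        {"id": 2, "name": "Bob", "dept": None},
    ]
}

# Each mapping pairs a query over the source (here: a table name and a row
# filter) with a template producing a fact in the ontology vocabulary.
MAPPINGS = [
    ("employees", lambda row: True,
     lambda row: ("Employee", f"e{row['id']}")),
    ("employees", lambda row: row["dept"] is not None,
     lambda row: ("worksFor", f"e{row['id']}", row["dept"])),
]

def materialize(source, mappings):
    """Trigger every mapping on the source and collect the resulting facts."""
    facts = set()
    for table, keep, to_fact in mappings:
        for row in source[table]:
            if keep(row):
                facts.add(to_fact(row))
    return facts

factbase = materialize(SOURCE, MAPPINGS)
print(sorted(factbase))
```

When the factbase is kept virtual, `materialize` is never called on the whole source; queries over the ontology vocabulary are instead translated into queries over the source through the mappings.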

OBDA is the core setting we consider in the Inria Project Lab iCODA on data journalism (https://project.inria.fr/icoda/). As part of Maxime Buron's PhD thesis (co-supervised by the CEDAR and GraphIK teams), we investigate several frameworks and query answering techniques in the OBDA setting. We consider the Semantic Web language RDFS to express the (possibly virtual) factbase and the core ontology, RDF rules that include classical RDF entailment rules but possibly richer ontological knowledge, expressive mappings (namely global-local-as-view mappings, whereas most existing work in the area is restricted to global-as-view mappings), and queries which, in the spirit of RDF, can interrogate both the ontology and the data at the same time. In particular, we proposed a new way of answering queries by a reduction to database query rewriting with views [21]. Software development and experiments are in progress.

We also pursued our work on inconsistency-tolerant query answering, revisiting existing complexity results obtained for OMQA in the wider context of OBDA, i.e., considering mappings. We formalized the problem and performed a detailed analysis of the data complexity of inconsistency-tolerant OBDA for ontologies formulated in data-tractable description logics, considering different semantics, notions of repairs and classes of GAV mappings. Our results imply that adding plain GAV mappings to the OMQA framework does not affect the data complexity of inconsistency-tolerant query answering, but considering mappings with negated atoms leads to higher complexity [20].
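The notion of repair behind inconsistency-tolerant semantics can be illustrated on a toy knowledge base. The sketch below is an assumption-laden example: it ignores mappings entirely, uses only atomic inclusions and one disjointness constraint, enumerates repairs by brute force, and compares two semantics commonly used in the literature (AR: answers holding in every repair; IAR: answers holding in the intersection of the repairs). All names and data are hypothetical.

```python
from itertools import combinations

INCLUSIONS = [("Professor", "Person"), ("Student", "Person")]
DISJOINT = [("Professor", "Student")]
FACTS = [("Professor", "ann"), ("Student", "ann"), ("Professor", "bob")]

def saturate(facts):
    """Close a set of facts under the atomic inclusions."""
    facts = set(facts)
    while True:
        new = {(h, c) for (b, h) in INCLUSIONS
               for (p, c) in facts if p == b} - facts
        if not new:
            return facts
        facts |= new

def consistent(facts):
    """A set of facts is consistent if no constant belongs to two disjoint concepts."""
    sat = saturate(facts)
    return not any((p, c) in sat and (q, c) in sat
                   for (p, q) in DISJOINT for (_, c) in sat)

def repairs(facts):
    """Subset-maximal consistent subsets, found by brute force (largest first)."""
    found = []
    for size in range(len(facts), -1, -1):
        for subset in combinations(facts, size):
            candidate = set(subset)
            if consistent(candidate) and not any(candidate < r for r in found):
                found.append(candidate)
    return found

def answers(pred, facts):
    return {c for (p, c) in saturate(facts) if p == pred}

def ar_answers(pred):
    return set.intersection(*(answers(pred, r) for r in repairs(FACTS)))

def iar_answers(pred):
    return answers(pred, set.intersection(*repairs(FACTS)))

print(sorted(ar_answers("Person")), sorted(iar_answers("Person")))
```

Here ann is a Person in every repair, so she is an AR answer, but neither of the conflicting facts about her survives in the intersection of the repairs, so she is not an IAR answer.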

Note that the latter work can also be seen as a contribution to maxi-consistent reasoning (see Section 7.2.2).